Several important libraries used in this project are:
# Import the required libraries
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler, StandardScaler, RobustScaler
from sklearn.decomposition import PCA
from matplotlib import pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import plotly.express as px
%matplotlib inline
The dataset used here is an Excel file, which is first loaded into a pandas DataFrame. Each column is then checked for rows with null values.
# Load the dataset (a raw string avoids backslash-escape issues on Windows paths)
df = pd.read_excel(r"..\excel\Cluster_Analysis.xlsx")
# Set the "Well No." column as the index
df = df.set_index("Well No.")
# Display the first rows and the column summary
display(df.head(10))
df.info()
| Well No. | Easting (ft) | Northing (ft) | X1 | X2 | X3 | X4 |
|---|---|---|---|---|---|---|
| Well 1 | 12561 | 8558 | 1325.60 | 5.533 | 405 | 35 |
| Well 2 | 11955 | 10140 | 341.90 | 3.508 | 430 | 40 |
| Well 3 | 9527 | 8273 | 4514.50 | 8.503 | 170 | 48 |
| Well 4 | 9845 | 8810 | 3444.10 | 6.034 | 185 | 35 |
| Well 5 | 9165 | 10140 | 86.00 | 1.929 | 185 | 32 |
| Well 6 | 9770 | 6170 | 3520.31 | 7.051 | 385 | 47 |
| Well 7 | 12485 | 6170 | 853.80 | 5.454 | 100 | 31 |
| Well 8 | 7205 | 8810 | 2188.70 | 4.715 | 50 | 27 |
| Well 9 | 6525 | 10141 | 7.40 | 0.582 | 10 | 21 |
| Well 10 | 4565 | 8810 | 4720.50 | 6.908 | 64 | 38 |
<class 'pandas.core.frame.DataFrame'>
Index: 48 entries, Well 1 to Well 48
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   Easting (ft)   48 non-null     int64
 1   Northing (ft)  48 non-null     int64
 2   X1             48 non-null     float64
 3   X2             48 non-null     float64
 4   X3             48 non-null     int64
 5   X4             48 non-null     int64
dtypes: float64(2), int64(4)
memory usage: 2.6+ KB
The statistical description of the dataset covers count, mean, standard deviation (std), and so on. Its purpose is to give a statistical overview of the dataset, including a check for potential outliers: each column's min and max values are compared against the mean ± 3 × std bounds to see whether they fall outside that range.
# Build the statistical description
desc_df = df.describe()
# Add the ±3-standard-deviation bounds
desc_df.loc['+3_std'] = desc_df.loc['mean'] + (desc_df.loc['std'] * 3)
desc_df.loc['-3_std'] = desc_df.loc['mean'] - (desc_df.loc['std'] * 3)
# Display
desc_df
| | Easting (ft) | Northing (ft) | X1 | X2 | X3 | X4 |
|---|---|---|---|---|---|---|
| count | 48.000000 | 48.000000 | 48.000000 | 48.000000 | 48.000000 | 48.000000 |
| mean | 11503.979167 | 5214.354167 | 6818.245208 | 6.907625 | 763.166667 | 42.312500 |
| std | 4511.728699 | 3143.363854 | 9139.622879 | 3.150359 | 715.889607 | 12.708108 |
| min | 3205.000000 | 225.000000 | 7.400000 | 0.582000 | 10.000000 | 13.000000 |
| 25% | 8183.750000 | 2292.000000 | 1002.450000 | 4.701250 | 181.250000 | 33.000000 |
| 50% | 11475.000000 | 4860.000000 | 3482.205000 | 6.888000 | 402.500000 | 44.500000 |
| 75% | 15227.500000 | 8344.250000 | 7743.625000 | 9.313000 | 1325.000000 | 51.000000 |
| max | 18405.000000 | 10141.000000 | 36347.400000 | 13.073000 | 2800.000000 | 69.000000 |
| +3_std | 25039.165263 | 14644.445729 | 34237.113845 | 16.358701 | 2910.835487 | 80.436824 |
| -3_std | -2031.206929 | -4215.737395 | -20600.623428 | -2.543451 | -1384.502154 | 4.188176 |
Based on this statistical description, every column's min and max fall inside the ±3 × std bounds, so the dataset shows no potential outliers.
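The ±3σ rule above can also be checked programmatically rather than by eye. The sketch below is not part of the original workflow; it uses a toy single-column DataFrame as a stand-in for the tutorial's `df`, with illustrative values taken from the X2 column.

```python
import pandas as pd

# Toy stand-in for the tutorial's DataFrame (illustrative X2 values only)
df = pd.DataFrame({"X2": [5.533, 3.508, 8.503, 6.034, 1.929]})

desc = df.describe()
# A column is flagged when its min or max escapes the mean ± 3*std band
upper = desc.loc["mean"] + 3 * desc.loc["std"]
lower = desc.loc["mean"] - 3 * desc.loc["std"]
has_outlier = (desc.loc["max"] > upper) | (desc.loc["min"] < lower)
print(has_outlier)
```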
# Visualize the 4 variables (X1, X2, X3, X4) in a 2D chart
fig1 = px.scatter(df, x = "X1", y = "X2", color = "X3", size = "X4")
fig1.update_layout(title = "4 variables in a 2D chart")
fig1.show()
# Visualize the 4 variables in a 3D chart
fig2 = px.scatter_3d(df, x = "X1", y = "X2", z = "X3", color = "X4")
fig2.update_layout(title = "4 variables in a 3D chart")
fig2.show()
In the 2D chart, the X axis shows X1, the Y axis shows X2, the color scale shows X3, and the point size shows X4. A positive, non-linear correlation between X1 and X2 is visible. Larger X1 and X2 values also go hand in hand with larger X3 values, indicated by colors shifting from dark blue toward yellow, and larger X4 values, indicated by relatively bigger points.
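A monotonic, possibly non-linear relation like the one described above can be quantified with a rank (Spearman) correlation instead of relying on the scatter plot alone. This sketch is an addition, not part of the original notebook; the toy values below are taken from the X1/X2 columns of the first table.

```python
import pandas as pd

# Illustrative X1/X2 pairs from the dataset preview (Wells 9, 5, 2, 1, 3)
df = pd.DataFrame({"X1": [7.4, 86.0, 341.9, 1325.6, 4514.5],
                   "X2": [0.582, 1.929, 3.508, 5.533, 8.503]})

# Spearman correlation works on ranks, so it is 1.0 for any
# strictly increasing (even non-linear) relationship
rho = df["X1"].corr(df["X2"], method="spearman")
print(rho)
```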
The next step is to normalize the dataset. Each variable/feature has its own unit, so standardization/normalization is needed. Three options are available for this (all imported above): MinMaxScaler, StandardScaler, and RobustScaler.
To learn more about standardization with these three options, follow this link: link. Because the dataset shows no potential outliers (based on its statistical description), the Min-Max scaler is used.
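For intuition, the three scalers can be compared side by side on a toy column. This sketch is an addition to the tutorial; the values are illustrative, with one large value to show why RobustScaler is the usual choice when outliers are present.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler, RobustScaler

# Toy column with one large value (illustrative only)
x = np.array([[1.0], [2.0], [3.0], [100.0]])

mm = MinMaxScaler().fit_transform(x)   # rescales into [0, 1]
ss = StandardScaler().fit_transform(x) # zero mean, unit variance
rs = RobustScaler().fit_transform(x)   # centers on median, scales by IQR

print(mm.ravel())
print(ss.ravel())
print(rs.ravel())
```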
min_max_scaler = MinMaxScaler()
# Normalize each feature column to [0, 1]
for col in ['X1', 'X2', 'X3', 'X4']:
    min_max_scaler.fit(df[[col]])
    df[col] = min_max_scaler.transform(df[[col]])
# Display
df.head(10)
| Well No. | Easting (ft) | Northing (ft) | X1 | X2 | X3 | X4 |
|---|---|---|---|---|---|---|
| Well 1 | 12561 | 8558 | 0.036274 | 0.396365 | 0.141577 | 0.392857 |
| Well 2 | 11955 | 10140 | 0.009205 | 0.234249 | 0.150538 | 0.482143 |
| Well 3 | 9527 | 8273 | 0.124026 | 0.634137 | 0.057348 | 0.625000 |
| Well 4 | 9845 | 8810 | 0.094571 | 0.436474 | 0.062724 | 0.392857 |
| Well 5 | 9165 | 10140 | 0.002163 | 0.107838 | 0.062724 | 0.339286 |
| Well 6 | 9770 | 6170 | 0.096668 | 0.517893 | 0.134409 | 0.607143 |
| Well 7 | 12485 | 6170 | 0.023291 | 0.390041 | 0.032258 | 0.321429 |
| Well 8 | 7205 | 8810 | 0.060025 | 0.330878 | 0.014337 | 0.250000 |
| Well 9 | 6525 | 10141 | 0.000000 | 0.000000 | 0.000000 | 0.142857 |
| Well 10 | 4565 | 8810 | 0.129695 | 0.506445 | 0.019355 | 0.446429 |
# Visualize the 4 normalized variables (X1, X2, X3, X4) in a 2D chart
fig3 = px.scatter(df, x = "X1", y = "X2", color = "X3", size = "X4")
fig3.update_layout(title = "4 variables in a 2D chart")
fig3.show()
In short, Principal Component Analysis (PCA) reduces a dataset with many variables to a smaller number of dimensions that still explain most of the variability of the original variables. With PCA, model processing is faster without losing much of that original variability. For more information about PCA, visit the following link: link
X_train_minmax = min_max_scaler.fit_transform(df[['X1', 'X2', 'X3', 'X4']])
X_train_minmax
# Fit a PCA object on the scaled features
pca = PCA().fit(X_train_minmax)
# Plot the cumulative sum of the explained variance
plt.figure()
plt.plot(np.cumsum(pca.explained_variance_ratio_))
# Define the labels and the title
plt.xlabel('Number of components', fontsize = 15)
plt.ylabel('Variance (%)', fontsize = 15)
plt.title('Explained Variance', fontsize = 20)
plt.grid(True, which = 'major', axis = 'both')
# Display
plt.show()
The chart shows that 100% of the variance can be explained with only 3 components, so those 3 components are used as the input for the k-means clustering stage.
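Instead of reading the cumulative plot by eye, the per-component ratios can be printed directly from `explained_variance_ratio_`. This sketch is an addition; a small random matrix stands in for the scaled 48×4 feature matrix used in the tutorial.

```python
import numpy as np
from sklearn.decomposition import PCA

# Random stand-in for the scaled 48-well, 4-feature matrix
rng = np.random.default_rng(0)
X = rng.random((48, 4))

pca = PCA().fit(X)
ratios = pca.explained_variance_ratio_  # one entry per component, sums to 1
print(np.cumsum(ratios))
```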
# Build the PCA dataset with 3 components
pca_dataset = PCA(n_components = 3).fit(X_train_minmax).transform(X_train_minmax)
# Store it in a new dataframe
pca_dataset = pd.DataFrame(data = pca_dataset, columns = ['principal component 1', 'principal component 2', 'principal component 3'])
# Display
pca_dataset.head(10)
| | principal component 1 | principal component 2 | principal component 3 |
|---|---|---|---|
| 0 | -0.257410 | 0.010241 | -0.014850 |
| 1 | -0.319013 | -0.005766 | -0.066692 |
| 2 | -0.013475 | 0.264365 | -0.042363 |
| 3 | -0.243245 | 0.086584 | 0.054877 |
| 4 | -0.501030 | -0.042296 | 0.031679 |
| 5 | -0.064880 | 0.156129 | -0.071152 |
| 6 | -0.351643 | 0.063840 | 0.042337 |
| 7 | -0.408143 | 0.024900 | 0.116175 |
| 8 | -0.682235 | -0.119085 | 0.148956 |
| 9 | -0.182367 | 0.169381 | 0.067104 |
# Visualize the 3 principal components in a 3D chart
fig5 = px.scatter_3d(pca_dataset, x = "principal component 1", y = "principal component 2", z = "principal component 3")
fig5.update_layout(title = "3 PCA components")
fig5.show()
fig6 = px.scatter(pca_dataset, x = "principal component 1", y = "principal component 2", color = "principal component 3")
fig6.update_layout(title = "3 PCA components")
fig6.show()
# Build an index list for pca_dataset
index = []
for i in range(1, 49):
    index.append('Well ' + str(i))
# Add a new 'Well No.' column
pca_dataset['Well No.'] = index
# Set the "Well No." column as the index (no leftover 'index' column
# remains, so no extra drop is needed)
pca_dataset = pca_dataset.set_index("Well No.")
# Display the first 10 rows
pca_dataset.head(10)
| Well No. | principal component 1 | principal component 2 | principal component 3 |
|---|---|---|---|
| Well 1 | -0.257410 | 0.010241 | -0.014850 |
| Well 2 | -0.319013 | -0.005766 | -0.066692 |
| Well 3 | -0.013475 | 0.264365 | -0.042363 |
| Well 4 | -0.243245 | 0.086584 | 0.054877 |
| Well 5 | -0.501030 | -0.042296 | 0.031679 |
| Well 6 | -0.064880 | 0.156129 | -0.071152 |
| Well 7 | -0.351643 | 0.063840 | 0.042337 |
| Well 8 | -0.408143 | 0.024900 | 0.116175 |
| Well 9 | -0.682235 | -0.119085 | 0.148956 |
| Well 10 | -0.182367 | 0.169381 | 0.067104 |
final_dataset = df.join(pca_dataset, how="inner")
final_dataset.head(10)
| Well No. | Easting (ft) | Northing (ft) | X1 | X2 | X3 | X4 | principal component 1 | principal component 2 | principal component 3 |
|---|---|---|---|---|---|---|---|---|---|
| Well 1 | 12561 | 8558 | 0.036274 | 0.396365 | 0.141577 | 0.392857 | -0.257410 | 0.010241 | -0.014850 |
| Well 2 | 11955 | 10140 | 0.009205 | 0.234249 | 0.150538 | 0.482143 | -0.319013 | -0.005766 | -0.066692 |
| Well 3 | 9527 | 8273 | 0.124026 | 0.634137 | 0.057348 | 0.625000 | -0.013475 | 0.264365 | -0.042363 |
| Well 4 | 9845 | 8810 | 0.094571 | 0.436474 | 0.062724 | 0.392857 | -0.243245 | 0.086584 | 0.054877 |
| Well 5 | 9165 | 10140 | 0.002163 | 0.107838 | 0.062724 | 0.339286 | -0.501030 | -0.042296 | 0.031679 |
| Well 6 | 9770 | 6170 | 0.096668 | 0.517893 | 0.134409 | 0.607143 | -0.064880 | 0.156129 | -0.071152 |
| Well 7 | 12485 | 6170 | 0.023291 | 0.390041 | 0.032258 | 0.321429 | -0.351643 | 0.063840 | 0.042337 |
| Well 8 | 7205 | 8810 | 0.060025 | 0.330878 | 0.014337 | 0.250000 | -0.408143 | 0.024900 | 0.116175 |
| Well 9 | 6525 | 10141 | 0.000000 | 0.000000 | 0.000000 | 0.142857 | -0.682235 | -0.119085 | 0.148956 |
| Well 10 | 4565 | 8810 | 0.129695 | 0.506445 | 0.019355 | 0.446429 | -0.182367 | 0.169381 | 0.067104 |
An elbow plot makes it easy to find the optimal number of clusters k for the features being considered. Like a bent arm, the optimal k sits at the joint between the two trend lines. For this case study, the optimal k is determined twice: once from the 3 principal components analyzed above (PC1, PC2, PC3) and once from the original dataset features (X1, X2, X3, X4).
sse = []
k_range = range(1, 10)
for k in k_range:
    km = KMeans(n_clusters = k)
    km.fit(final_dataset[['principal component 1', 'principal component 2', 'principal component 3']])
    sse.append(km.inertia_)
# Plot
plt.xlabel('K')
plt.ylabel('Sum of squared error')
plt.plot(k_range, sse)
sse2 = []
k_range2 = range(1, 10)
for k in k_range2:
    km = KMeans(n_clusters = k)
    km.fit(final_dataset[['X1', 'X2', 'X3', 'X4']])
    sse2.append(km.inertia_)
# Plot
plt.xlabel('K')
plt.ylabel('Sum of squared error')
plt.plot(k_range2, sse2)
Both elbow plots indicate that the optimal K is 2. For this case study, however, K = 3 and K = 4 are also used for comparison.
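As a complement to the elbow plot, the silhouette score offers a second opinion on k: it peaks at the k that yields the most compact, well-separated clusters. This sketch is an addition to the original workflow; two toy blobs stand in for the PCA features.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Two well-separated toy blobs as a stand-in for the 3 PCA columns
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (24, 3)),
               rng.normal(1, 0.1, (24, 3))])

scores = {}
for k in [2, 3, 4]:
    labels = KMeans(n_clusters = k, random_state = 0, n_init = 10).fit_predict(X)
    scores[k] = silhouette_score(X, labels)  # higher is better, range [-1, 1]
print(scores)
```

Here the score should peak at k=2, matching what the elbow plots suggest for the well data.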
clusters = [2, 3, 4]
for cluster in clusters:
    print('-' * 100)
    km = KMeans(n_clusters = cluster, random_state = 0).fit(final_dataset[['principal component 1', 'principal component 2', 'principal component 3']])
    # Define the cluster centers
    cluster_centers = km.cluster_centers_
    C1 = cluster_centers[:, 0]
    C2 = cluster_centers[:, 1]
    C3 = cluster_centers[:, 2]
    # Create a new 3D plot
    fig = plt.figure()
    ax = fig.add_subplot(projection = '3d')
    # Take the PCA columns as the plotted features
    x = final_dataset['principal component 1']
    y = final_dataset['principal component 2']
    z = final_dataset['principal component 3']
    # Define the axes labels
    column_names = final_dataset.columns
    ax.set_xlabel(column_names[6])
    ax.set_ylabel(column_names[7])
    ax.set_zlabel(column_names[8])
    # Plot the points colored by cluster label, plus the centers
    ax.scatter(x, y, z, c = km.labels_.astype(float), cmap = 'cool')
    ax.scatter(C1, C2, C3, marker = "x", color = 'r')
    plt.title('Data visualization with {} clusters'.format(cluster), fontweight = 'bold')
    plt.show()
clusters = [2, 3, 4]
y_predicted = []
for cluster in clusters:
    kmeans = KMeans(n_clusters = cluster, random_state = 0).fit_predict(final_dataset[['principal component 1', 'principal component 2', 'principal component 3']])
    y_predicted.append(kmeans)
y_predicted
[array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1,
1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
0, 1, 1, 0]),
array([2, 0, 2, 2, 0, 2, 0, 0, 0, 2, 0, 0, 1, 1, 1, 1, 2, 2, 2, 1, 1, 1,
1, 1, 1, 1, 2, 2, 1, 0, 0, 2, 2, 2, 0, 0, 2, 2, 2, 2, 0, 1, 2, 0,
0, 1, 1, 2]),
array([2, 0, 2, 2, 0, 2, 0, 0, 0, 2, 0, 0, 1, 3, 3, 3, 2, 2, 2, 1, 3, 1,
1, 3, 1, 1, 2, 2, 1, 0, 0, 2, 2, 2, 0, 0, 2, 2, 2, 2, 0, 1, 2, 0,
0, 1, 1, 2])]
column_names = {'X1': 'norm_X1', 'X2': 'norm_X2', 'X3': 'norm_X3', 'X4': 'norm_X4'}
# Rename the normalized columns
final_dataset = final_dataset.rename(columns = column_names)
final_dataset['cluster with k=2'] = y_predicted[0]
final_dataset['cluster with k=3'] = y_predicted[1]
final_dataset['cluster with k=4'] = y_predicted[2]
final_dataset
| Well No. | Easting (ft) | Northing (ft) | norm_X1 | norm_X2 | norm_X3 | norm_X4 | principal component 1 | principal component 2 | principal component 3 | cluster with k=2 | cluster with k=3 | cluster with k=4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Well 1 | 12561 | 8558 | 0.036274 | 0.396365 | 0.141577 | 0.392857 | -0.257410 | 0.010241 | -0.014850 | 0 | 2 | 2 |
| Well 2 | 11955 | 10140 | 0.009205 | 0.234249 | 0.150538 | 0.482143 | -0.319013 | -0.005766 | -0.066692 | 0 | 0 | 0 |
| Well 3 | 9527 | 8273 | 0.124026 | 0.634137 | 0.057348 | 0.625000 | -0.013475 | 0.264365 | -0.042363 | 0 | 2 | 2 |
| Well 4 | 9845 | 8810 | 0.094571 | 0.436474 | 0.062724 | 0.392857 | -0.243245 | 0.086584 | 0.054877 | 0 | 2 | 2 |
| Well 5 | 9165 | 10140 | 0.002163 | 0.107838 | 0.062724 | 0.339286 | -0.501030 | -0.042296 | 0.031679 | 0 | 0 | 0 |
| Well 6 | 9770 | 6170 | 0.096668 | 0.517893 | 0.134409 | 0.607143 | -0.064880 | 0.156129 | -0.071152 | 0 | 2 | 2 |
| Well 7 | 12485 | 6170 | 0.023291 | 0.390041 | 0.032258 | 0.321429 | -0.351643 | 0.063840 | 0.042337 | 0 | 0 | 0 |
| Well 8 | 7205 | 8810 | 0.060025 | 0.330878 | 0.014337 | 0.250000 | -0.408143 | 0.024900 | 0.116175 | 0 | 0 | 0 |
| Well 9 | 6525 | 10141 | 0.000000 | 0.000000 | 0.000000 | 0.142857 | -0.682235 | -0.119085 | 0.148956 | 0 | 0 | 0 |
| Well 10 | 4565 | 8810 | 0.129695 | 0.506445 | 0.019355 | 0.446429 | -0.182367 | 0.169381 | 0.067104 | 0 | 2 | 2 |
| Well 11 | 4565 | 6170 | 0.059031 | 0.326475 | 0.014337 | 0.214286 | -0.427257 | 0.006467 | 0.132051 | 0 | 0 | 0 |
| Well 12 | 7205 | 7500 | 0.116425 | 0.308382 | 0.050896 | 0.357143 | -0.327525 | 0.037954 | 0.104189 | 0 | 0 | 0 |
| Well 13 | 5845 | 3500 | 0.448775 | 0.728204 | 0.340502 | 0.767857 | 0.399489 | 0.125190 | 0.062541 | 1 | 1 | 1 |
| Well 14 | 6029 | 4860 | 0.586767 | 0.830438 | 0.125448 | 1.000000 | 0.528360 | 0.442268 | 0.129946 | 1 | 1 | 3 |
| Well 15 | 3205 | 3530 | 0.693905 | 0.941238 | 0.283154 | 0.767857 | 0.614910 | 0.235907 | 0.264718 | 1 | 1 | 3 |
| Well 16 | 3255 | 4860 | 0.983387 | 1.000000 | 0.534050 | 0.803571 | 0.926630 | 0.061342 | 0.402751 | 1 | 1 | 3 |
| Well 17 | 4565 | 2220 | 0.193242 | 0.553839 | 0.105018 | 0.535714 | -0.042917 | 0.155905 | 0.047474 | 0 | 2 | 2 |
| Well 18 | 7280 | 620 | 0.109744 | 0.472100 | 0.070251 | 0.375000 | -0.219850 | 0.083011 | 0.070069 | 0 | 2 | 2 |
| Well 19 | 7205 | 2316 | 0.149997 | 0.413578 | 0.498208 | 0.392857 | -0.021867 | -0.278149 | -0.035709 | 0 | 2 | 2 |
| Well 20 | 8559 | 3952 | 0.557144 | 0.681931 | 0.829749 | 0.553571 | 0.562519 | -0.394195 | 0.098533 | 1 | 1 | 1 |
| Well 21 | 8485 | 3530 | 1.000000 | 0.679369 | 0.569892 | 0.482143 | 0.623406 | -0.224038 | 0.579891 | 1 | 1 | 3 |
| Well 22 | 8485 | 4860 | 0.556915 | 0.723961 | 0.792115 | 0.660714 | 0.616873 | -0.298834 | 0.057697 | 1 | 1 | 1 |
| Well 23 | 9770 | 3530 | 0.337111 | 0.828917 | 0.713262 | 0.714286 | 0.555297 | -0.171424 | -0.132504 | 1 | 1 | 1 |
| Well 24 | 10995 | 4860 | 0.653853 | 0.980226 | 0.534050 | 0.678571 | 0.696591 | 0.001060 | 0.189273 | 1 | 1 | 3 |
| Well 25 | 10985 | 3477 | 0.332251 | 0.822752 | 0.426523 | 0.821429 | 0.461276 | 0.112537 | -0.093802 | 1 | 1 | 1 |
| Well 26 | 9845 | 2220 | 0.226161 | 0.595068 | 0.569892 | 0.678571 | 0.282477 | -0.143848 | -0.141896 | 1 | 1 | 1 |
| Well 27 | 9633 | 247 | 0.033112 | 0.340165 | 0.204301 | 0.767857 | -0.092197 | 0.119982 | -0.203768 | 0 | 2 | 2 |
| Well 28 | 12560 | 890 | 0.031890 | 0.463374 | 0.301075 | 0.607143 | -0.048710 | 0.003417 | -0.172880 | 0 | 2 | 2 |
| Well 29 | 12485 | 2220 | 0.208459 | 0.806020 | 0.534050 | 0.839286 | 0.449917 | 0.029838 | -0.236956 | 1 | 1 | 1 |
| Well 30 | 17765 | 8810 | 0.007179 | 0.101993 | 0.032258 | 0.000000 | -0.669392 | -0.181235 | 0.200982 | 0 | 0 | 0 |
| Well 31 | 16405 | 10140 | 0.016417 | 0.179009 | 0.050179 | 0.357143 | -0.451187 | -0.001032 | 0.033074 | 0 | 0 | 0 |
| Well 32 | 18405 | 10140 | 0.040110 | 0.618125 | 0.247312 | 0.642857 | 0.034442 | 0.113728 | -0.178826 | 0 | 2 | 2 |
| Well 33 | 15065 | 8810 | 0.012369 | 0.354896 | 0.086022 | 0.500000 | -0.271040 | 0.094025 | -0.062265 | 0 | 2 | 2 |
| Well 34 | 14445 | 10140 | 0.033987 | 0.494596 | 0.103943 | 0.607143 | -0.123589 | 0.174712 | -0.111230 | 0 | 2 | 2 |
| Well 35 | 18180 | 5580 | 0.001497 | 0.078777 | 0.008961 | 0.232143 | -0.591883 | -0.058687 | 0.099687 | 0 | 0 | 0 |
| Well 36 | 17020 | 7500 | 0.007507 | 0.245697 | 0.113620 | 0.321429 | -0.403331 | -0.048564 | 0.016042 | 0 | 0 | 0 |
| Well 37 | 13765 | 6170 | 0.130358 | 0.743816 | 0.139785 | 0.750000 | 0.148006 | 0.291444 | -0.129995 | 0 | 2 | 2 |
| Well 38 | 15125 | 6170 | 0.028745 | 0.503242 | 0.093190 | 0.571429 | -0.142443 | 0.169319 | -0.096587 | 0 | 2 | 2 |
| Well 39 | 16555 | 3530 | 0.158864 | 0.475382 | 0.139785 | 0.553571 | -0.080168 | 0.111517 | 0.006712 | 0 | 2 | 2 |
| Well 40 | 16075 | 4860 | 0.051145 | 0.616604 | 0.039427 | 0.625000 | -0.067943 | 0.274695 | -0.095359 | 0 | 2 | 2 |
| Well 41 | 17915 | 4860 | 0.003222 | 0.070371 | 0.028674 | 0.053571 | -0.666989 | -0.162760 | 0.177144 | 0 | 0 | 0 |
| Well 42 | 13765 | 4860 | 0.262369 | 0.729966 | 0.534050 | 0.678571 | 0.360475 | -0.072072 | -0.112364 | 1 | 1 | 1 |
| Well 43 | 15125 | 2220 | 0.083778 | 0.523257 | 0.462366 | 0.678571 | 0.120070 | -0.076298 | -0.219110 | 0 | 2 | 2 |
| Well 44 | 14552 | 890 | 0.020380 | 0.282443 | 0.462366 | 0.196429 | -0.266411 | -0.382180 | -0.030257 | 0 | 0 | 0 |
| Well 45 | 15535 | 225 | 0.010993 | 0.232487 | 0.354839 | 0.196429 | -0.350784 | -0.309948 | 0.000517 | 0 | 0 | 0 |
| Well 46 | 16405 | 890 | 0.105630 | 0.747578 | 1.000000 | 0.750000 | 0.547212 | -0.411061 | -0.424106 | 1 | 1 | 1 |
| Well 47 | 17765 | 890 | 0.097427 | 0.690657 | 0.534050 | 0.767857 | 0.296974 | -0.039162 | -0.286007 | 1 | 1 | 1 |
| Well 48 | 17765 | 2220 | 0.070099 | 0.568649 | 0.354839 | 0.625000 | 0.064001 | 0.000878 | -0.175739 | 0 | 2 | 2 |
# Export the dataset to Excel (raw string avoids backslash-escape issues)
final_dataset.to_excel(r'..\excel\Result.xlsx')
# Visualize the data in 2 clusters
fig7 = px.scatter(final_dataset, x = "Easting (ft)", y = "Northing (ft)", color = "cluster with k=2")
fig7.update_layout(title = "Data visualization with 2 clusters")
fig7.show()
# Visualize the data in 3 clusters
fig8 = px.scatter(final_dataset, x = "Easting (ft)", y = "Northing (ft)", color = "cluster with k=3")
fig8.update_layout(title = "Data visualization with 3 clusters")
fig8.show()